Abstract

We study a scheduling problem in which jobs may be split into parts, where the parts of a split job may be 
processed simultaneously on more than one machine. Each part of a job requires a setup time, however, on 
the machine where the job part is processed. During setup a machine cannot process or set up any other 
job. We concentrate on the basic case in which setup times are job-, machine-, and sequence-independent.
Problems of this kind were encountered when modelling practical problems in planning disaster relief 
operations. Our main algorithmic result is a polynomial-time algorithm for minimising total completion 
time on two parallel identical machines. We argue why the same problem with three machines is not an 
easy extension of the two-machine case, leaving the complexity of this case as a tantalising open problem. 
We give a constant-factor approximation algorithm for the general case with any number of machines and 
a polynomial-time approximation scheme for a fixed number of machines. For the version with objective 
minimising weighted total completion time we prove NP-hardness. Finally, we conclude with an overview 
of the state of the art for other split scheduling problems with job-, machine-, and sequence-independent 
setup times. 

1 Introduction 

We consider a scheduling problem with setup times and job splitting. Given a set of identical parallel 
machines and a set of jobs with processing times, the goal of the scheduling problem is to schedule the jobs 
on the machines such that a given objective, for example the makespan or the sum of completion times, is 
minimised. With ordinary preemption, feasible schedules do not allow multiple machines to work on the 
same job simultaneously. In job splitting, this constraint is dropped. Without setup times, allowing job 
splitting makes many scheduling problems trivial: both for minimising makespan and for minimising total
(weighted) completion time, an optimal schedule is obtained by splitting the processing time of each job 
equally over all machines, and processing the jobs in arbitrary order on each machine in case of makespan, 
and in (weighted) shortest processing time first ((W)SPT) order in case of total (weighted) completion time. 
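For the setup-free case just described, the following minimal Python sketch (ours, not from the paper; the function name `split_equally_wspt` is hypothetical) computes the total weighted completion time obtained by splitting every job equally over all m machines and sequencing the jobs in WSPT order. Job j then completes at 1/m times the total processing time of itself and the jobs before it. It assumes strictly positive processing times and uses `fractions.Fraction` for exact values.

```python
from fractions import Fraction


def split_equally_wspt(p, w, m):
    """Setup-free job splitting: every job is split equally over all m machines
    and jobs run in WSPT order (non-increasing w_j / p_j).
    Returns (processing order, total weighted completion time)."""
    # WSPT order; assumes every p_j > 0 so that the ratio is defined.
    order = sorted(range(len(p)), key=lambda j: Fraction(w[j], p[j]), reverse=True)
    elapsed = Fraction(0)  # all machines finish the current job simultaneously
    total = Fraction(0)
    for j in order:
        elapsed += Fraction(p[j], m)  # each machine receives p_j / m units of job j
        total += w[j] * elapsed
    return order, total
```

With unit weights and processing times 1, 2, 3, 5, 11, 12 on three machines, the completion times are the prefix sums 1, 3, 6, 11, 22, 34 divided by 3, for a total of 77/3.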

*This research was partially supported by Tinbergen Institute. 
†Research supported by EU-IRSES grant EUSACOU.
See Xing and Zhang [12] for an overview of several classical scheduling problems which become polynomially solvable if job splitting is allowed.

In the presence of release times, minimising total completion time with ordinary preemption is NP-hard 
[5], whereas it is easy to see that if we allow job splitting, then splitting all jobs equally over all machines 
and applying the shortest remaining processing time first (SRPT) rule gives an optimal schedule. 

Triviality disappears when setup times are present, i.e., when each machine requires a setup time before 
it can start processing the next job (part). During setup, a machine cannot process any job nor can it set 
up the processing of any other job (part). Problems for which the setup times are allowed to be sequence-dependent are usually NP-hard, as such problems tend to exhibit routing-like features. For example, the Hamiltonian path problem in a graph can be reduced to the problem of minimising the makespan on a single machine, where each job corresponds to a node in the graph, the processing times are 1, and the setup time between jobs i and j is 0 if the graph contains an edge between i and j, and 1 otherwise. However, as we will see, adding setup times leads to challenging algorithmic problems already if the setup times are assumed to be job-, machine-, and sequence-independent.

We encountered such problems in studying disaster relief operations. For example in modelling flood relief 
operations, the machines are pumps and the jobs are locations to be drained. Or in the case of earthquake 
relief operations, the machines are teams of relief workers and the jobs are locations to be cleared. The 
setup is the time required to install the team at the new location. Although, in principle, these setup times
consist partly of travel time, which is sequence-dependent, the travel time is negligible compared to the
time required to equip the teams with instructions and tools for the new location. Hence, considering the 
setup times as being location- and sequence-independent was in this case an acceptable approximation of 
reality. 

In this paper we concentrate on a basic scheduling problem and consider the variation where we allow 
job splitting with setup times that are job-, machine-, and sequence- independent, to which we will refer 
here as uniform setup times; i.e., we assume a uniform setup time s. There exists little literature on this 
type of scheduling problems. The problem of minimising makespan on parallel identical machines is in the 
standard scheduling notation of Graham et al. [6] denoted as $P||C_{\max}$ (see Section 8 for an explanation of
this notation). This problem $P||C_{\max}$, but then with job splitting and setup times that are job-dependent,
but sequence- and machine-independent, is considered by Xing and Zhang [12], and Chen, Ye and Zhang [3]. 
Chen et al. [3] mention that this problem is NP-hard in the strong sense, and only weakly NP-hard if the 
number of machines is assumed constant. Straightforward reductions from the 3-Partition and Subset 
Sum problem show that these hardness results continue to hold if setup times are uniform. Chen et al. 
provide a 5/3-approximation algorithm for this problem and an FPTAS for the case of a fixed number 
of machines. A PTAS for the version of $P||C_{\max}$ with preemption and job-dependent, but sequence- and
machine-independent setup times was given by Schuurman and Woeginger [10]. It remains open whether a
PTAS exists with job splitting rather than preemption, even if the setup times are uniform. See [8] and [9] 
for a more extensive literature on problems with preemption and setup times. 

Our problem is related to scheduling problems with malleable tasks. A malleable task may be scheduled 
on multiple machines, and a function $f_j(k)$ is given that denotes the processing speed if j is processed on k machines. If k machines process task j for L time, then $f_j(k)L$ units of task j are completed. What we call job splitting is referred to as malleable tasks with linear speedups, i.e., the processing time required on k machines is $1/k$ times the processing time required on a single machine. We remark that job splitting with setup times is not a special case of scheduling malleable tasks, because of the discontinuity caused by the setup times. We refer the reader to Drozdowski [4] for an extensive overview of the literature on scheduling
malleable tasks. 

The main algorithmic result of our paper considers the job splitting variant of the problem of minimising 
the sum of completion times on identical machines, with uniform setup times: given a set of m identical 
machines, n jobs with processing times $p_1, \dots, p_n$, and a setup time s, the objective is to schedule the jobs
on the machines to minimise total completion time ($\sum_j C_j$) (where the chosen objective is inspired by the
disaster relief application). The version of this problem with ordinary preemption and fixed setup time s 
is solved by the Shortest Processing Time first rule (SPT); the option of preemption is not used by the 
optimum. However, the situation is much less straightforward for job splitting. If s is very large, then an
optimal schedule minimises the contribution of the setup times to the objective, and a job will only be split 
over several machines if no other job is scheduled after the job on these machines. It is not hard to see 
that the jobs that are not split are scheduled in SPT order. If s is very small (say 0), then each job is 
split over all machines and the jobs are scheduled in SPT order. However, for other values of s, it appears 
to be a non-trivial problem to decide how to schedule the jobs, as splitting a job over multiple machines 
decreases the completion time of the job itself, but it increases the total load on the machines, and hence 
the completion times of later jobs. 

Consider the following instance as an example. There are 3 machines and 6 jobs, numbered 1, 2, . . . , 6, 
with processing times 1,2,3,5,11,12, respectively, and setting up a machine takes 1 time unit. One could 
consider filling up a schedule in round-robin style, assigning the jobs to machine 1, 2, 3, 1, 2, 3, respectively. 
This schedule is given in the Gantt chart in Figure 1(a). The schedule has objective value 49.

Figure 1: Gantt charts depicting the schedules for the instance described in Section 1. The grey blocks indicate the setup times, the numbered blocks are scheduled job parts. Each row of blocks gives the schedule for a machine.



By splitting job 6 over machines 1 and 3, instead of processing it on machine 3 only, we can lower the 
completion time of job 6, and this improves the objective value since there are no jobs scheduled after job 
6. In fact, to get the best improvement in objective value, we make sure that both job parts of job 6 finish 
at the same time; see Figure 1(b). The objective value of this schedule is 45.



Splitting jobs early in the schedule may increase the objective value, as (many) later jobs may experience
delays. For example, if we choose to split job 2 over machines 2 and 3, we will cause delays for jobs 3 and 
6, while improving the completion times of jobs 2 and 5. If we require that job parts of the same job end at 
the same time, we get the schedule pictured in Figure 1(c) with objective value 46. Figure 1(d) depicts the 
optimal schedule with objective value 40. 
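The objective values 49, 45 and 46 quoted above can be reproduced with a few lines of exact arithmetic. The Python sketch below is ours, not the paper's: the helper names `balanced_finish` and `objective` are hypothetical, machines are numbered from 0, jobs are processed in index order, and every job is balanced over the machines listed for it. The machine sets for schedules (b) and (c) are our reading of the description above (job 6 split over machines 1 and 3; additionally job 2 split over machines 2 and 3, with job 3 following it on machine 3).

```python
from fractions import Fraction


def balanced_finish(free_times, p, s):
    """Completion time of a job with processing time p split over machines that
    become free at free_times, such that all parts end simultaneously; every
    part is preceded by its own setup s."""
    k = len(free_times)
    finish = Fraction(sum(free_times) + k * s + p, k)
    # each part must have non-negative length on every chosen machine
    if any(finish < t + s for t in free_times):
        raise ValueError("balanced split is infeasible on these machines")
    return finish


def objective(assignment, p, s, m):
    """Total completion time; assignment[j] lists the machines used by job j."""
    free = [0] * m
    total = 0
    for j, machines in enumerate(assignment):
        c = balanced_finish([free[i] for i in machines], p[j], s)
        for i in machines:
            free[i] = c
        total += c
    return total


P = [1, 2, 3, 5, 11, 12]
SCHEDULE_A = [[0], [1], [2], [0], [1], [2]]          # round robin, Figure 1(a)
SCHEDULE_B = [[0], [1], [2], [0], [1], [0, 2]]       # job 6 split over machines 1 and 3
SCHEDULE_C = [[0], [1, 2], [2], [0], [1], [0, 2]]    # job 2 also split over machines 2 and 3
```

Calling `objective(SCHEDULE_A, P, 1, 3)` and its two variants gives 49, 45 and 46, respectively.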

This illustrates the inherent trade-off in this problem mentioned earlier: splitting jobs will decrease the 
completion times of some jobs, but it also may increase the completion times of other jobs. 

In Section [3] we present a polynomial-time algorithm for the case in which there are two machines. The 
algorithm is based on a careful analysis of the structure of optimal solutions to this problem. Structures of 
optimal solutions that hold for any number of machines are presented in a preliminary section. Though a first guess might be that the problem would remain easy on any fixed number of machines, we will show by some examples in Section 4 that nice properties, which make the algorithm work for the 2-machine case, fail to hold for three machines already. The authors are split between thinking that we have encountered another instance of Lawler's "mystical power of twoness" [7], a phrase signifying the surprisingly common occurrence that problems are easy when a problem parameter (here the number of machines) is two, but NP-hard when it is three, or that we just lacked the necessary flash of insight to find a polynomial-time algorithm. We present a constant-factor approximation algorithm for the general case with any number of machines in Section 5, and in Section 6 we give a polynomial-time approximation scheme for the case of a fixed number of machines. We leave the complexity of the problem (even for only three machines) as a tantalising open problem for the scheduling research community. We show in Section 7 that introducing
weights for the jobs makes the problem NP-hard, already on 2 machines. We finish the paper by giving a 
table with the state of the art for other split scheduling problems with uniform setup times. We summarize 
whether they are known to be NP-hard or in P, and present the best known approximation ratios. 

2 Preliminaries 

An instance is given by m parallel identical machines and n jobs. Job j has processing time $p_j$, for $j = 1, \dots, n$. Each job may be split into parts and multiple parts of the same job may be processed simultaneously. Before a machine can start processing a part of a job, a fixed setup time s is required. During setup of a job (part) the machine cannot simultaneously process or set up another job (part). The objective is to minimise the sum of the completion times of the jobs (total completion time), which is equivalent to minimising the average completion time.

Here we derive some properties of an optimal schedule, which are valid for any number of machines. 
Some additional properties for the special case of two machines, presented in Section 3, will lead us to a polynomial-time algorithm for this special case. We show in Section 4 that the additional properties that make the 2-machine case tractable do not hold for the case of three machines.

Lemma 1. Let σ be a feasible schedule with job completion times $C_1 \le C_2 \le \dots \le C_n$. Let σ' be obtained from σ by rescheduling the job parts on each machine in order $1, 2, \dots, n$. Then $C'_j \le C_j$ for $j = 1, \dots, n$.

Proof. Let $q_{ij}$ be the time that j is processed on machine i in σ and let $C_{ij}$ be the time that j finishes on machine i. Let $y_{ij} = s + q_{ij}$ if $q_{ij} > 0$ and let $y_{ij} = 0$ otherwise. Fix some job j and machine i, and let $C'_{ij}$ denote the time at which j finishes on machine i in σ'. Let $k = \arg\max\{C_{ik'} \mid 1 \le k' \le j\}$. Then $C_j \ge C_k \ge C_{ik} \ge \sum_{h=1}^{j} y_{ih} = C'_{ij}$, where the first inequality is by assumption and the last one by the fact that all work on jobs smaller than or equal to j has been done on machine i at time $C_{ik}$. Since $C_j \ge C'_{ij}$ for any machine i on which j is scheduled, the proof follows. □

The lemma above has several nice corollaries. First, note that if in an optimal schedule $C_1 \le C_2 \le \dots \le C_n$, then we maintain an optimal schedule with the same completion time for each job by scheduling the job parts on each machine in the order $1, 2, \dots, n$. This allows us to characterize an optimal schedule by
a permutation of the jobs and the times that job j is processed on each machine i. The optimal schedule 
is then obtained by adding a setup time s for each non-zero job part and processing them in the order of 
the permutation on each machine. Consequently, in the optimal schedule obtained each machine contains at 
most one part of each job. 

In the sequel, given a schedule, we use Mj to denote the set of machines on which parts of job j are 
processed. We will sometimes say that a machine processes job j, if it processes a part of job j. 

Lemma 2. There exists an optimal schedule such that on each machine the job parts are processed (started 
and completed) in SPT order of the corresponding jobs. 

Proof. We consider an optimal schedule of the form of Lemma 1, and among such schedules, we choose one that maximizes $\sum_j p_j C_j$.

We claim that there are no two jobs j, k with $p_j < p_k$ and $C_j > C_k$ that are processed on a common machine. Suppose otherwise, so that $M_j \cap M_k \neq \emptyset$. We define a new schedule by rescheduling both jobs within the time slots these jobs occupy in the current schedule (including the slots for the setup times). First remove both jobs. Then consider the machines in $M_k$ one by one, starting with the machines in $M_k \setminus M_j$, and fill up the slot previously used by job k, until we have completely scheduled job j including the setup times. This is possible since $p_j < p_k$. We will show that job k can be scheduled in the remaining slots.

Let $M'_j$ and $M'_k$ denote the sets of machines occupied by j and k, respectively, in the new schedule. We distinguish two cases. If job j cannot be rescheduled completely in the slots used by k in $M_k \setminus M_j$, then these slots are all used by j, so $M'_k \subseteq M_j$. Together with $M'_j \subseteq M_k$ it follows that $(M'_j \cap M'_k) \subseteq (M_j \cap M_k)$. Hence, any machine containing both j and k in the new schedule also contained both jobs in the old schedule, and no extra setups are needed on any machine.

Now consider the case that job j is rescheduled completely in the slots used by k in $M_k \setminus M_j$. Then, after adding job k, the total number of setups needed for j and k does not increase, since there is at most one machine of $M_k \setminus M_j$ containing both jobs in the new schedule, but none of the machines in $M_j \cap M_k$ is used by j in the new schedule.

Let $C'$ denote the new completion times. We have $C'_j \le C_k$ and $C'_k \le \max\{C_j, C_k\}$, since in the new schedule j is processed only where job k was processed in the old schedule, and job k is processed in the new schedule only where either job j or job k was processed in the old schedule. For all other jobs, the completion time remains the same. Now, by assumption, we have that $C_j > C_k$, and hence $C'_k \le C_j$. Therefore, the sum of completion times did not increase; since the original schedule is optimal, it cannot decrease either, so $C'_j = C_k$ and $C'_k = C_j$ (reordering the new schedule as in Lemma 1 then changes no completion time). Consequently $\sum_\ell p_\ell C'_\ell - \sum_\ell p_\ell C_\ell = (p_k - p_j)(C_j - C_k) > 0$, which contradicts the choice of the original schedule.

Hence, if there exist jobs j, k such that $p_j < p_k$ and there exists some machine i that processes job k before j, then it must be the case that $C_j \le C_k$. If such jobs j, k exist, then there also exist such jobs j, k and a machine i such that k is processed immediately before j on machine i. Now, reversing these parts in the schedule does not increase the completion times of jobs j and k and does not affect the schedule of the other jobs. Moreover, we decrease the number of triples (i, j, k) such that $p_j < p_k$ and machine i processes k before j. Repeating this procedure gives an optimal schedule that satisfies the lemma. □

From now on, assume that jobs are numbered in SPT order, i.e., $p_1 \le \dots \le p_n$. Given a schedule, we
call a job balanced if it completes at the same time on all machines on which it is processed. 

Lemma 3. There exists an optimal schedule in which all jobs are balanced. 

Proof. Consider an optimal schedule with a minimum number of job parts. Let $C^*_j$ be the completion time of j in this schedule and define $M_j$ for this schedule as before. Consider the following linear program, in which there is a variable $x_{ij}$ for all pairs i, j with $i \in M_j$, indicating the amount of processing time of job j assigned to machine i:

$$
\begin{aligned}
\min\ & \sum_{j=1}^{n} C_j \\
\text{s.t.}\ & \sum_{i \in M_j} x_{ij} = p_j && \forall j = 1, \dots, n, \\
& \sum_{k \le j:\ i \in M_k} (s + x_{ik}) \le C_j && \forall j = 1, \dots, n,\ \forall i \in M_j, \\
& x_{ij} \ge 0,\ C_j \ge 0 && \forall j = 1, \dots, n,\ \forall i \in M_j.
\end{aligned}
\tag{1}
$$

Note that a schedule that satisfies Lemmas 1 and 2 gives a feasible solution to the LP, and on the other hand that any feasible solution to the LP gives a schedule with total completion time at most the objective value of the LP: if there exist some j and $i \in M_j$ such that $x_{ij} = 0$, then the LP objective value is at least the total completion time of the corresponding schedule, as there is no need to set up for job j on machine i if $x_{ij} = 0$. We know that a solution is a basic solution to this LP only if the number of variables that are non-zero is at most the number of linearly independent tight constraints (not including the non-negativity constraints). By the minimality assumption on the optimal schedule, in any optimal solution to the LP all $C_j$ and $x_{ij}$ variables are non-zero, which gives a total of $n + \sum_j |M_j|$ variables. Since there are only $n + \sum_j |M_j|$ constraints, all constraints must be tight, which proves the lemma. □

3 An O(n log n) time algorithm for two machines

Given a feasible schedule, we call a job j a d-job if $|M_j| = d$. In this section we assume that the number of
machines is two. 

Lemma 4. Let σ be an optimal schedule for a 2-machine instance and let $j < k$ be two consecutive 2-jobs.
If there are 1-jobs between j and k, then there is at least one 1-job on each machine. Also, the last 2-job is 
either not followed by any job or is followed by at least one 1-job on each machine. 

Proof. Let j and k be two consecutive 2-jobs and assume there is at least one in-between 1-job on machine 1 and none on machine 2. Let $s_1, s_2$ be the start times of job j on machines 1 and 2, respectively. We may assume without loss of generality that $s_1 \ge s_2$; otherwise we just swap the schedules of the two machines for the interval $[0, C_j]$ and get the inequality. We change the schedule of j and k and the in-between 1-jobs as follows. Job j is completely processed on machine 2, starting from time $s_2$, and the in-between 1-jobs are moved forward such that the first starts at time $s_1$. Since $p_j \le p_k$, and the part of job j we moved to machine 2 is at most $\frac{1}{2}p_j$, whereas the part of job k that was on machine 2 is at least $\frac{1}{2}p_k$, we can still schedule job k such that its completion time remains the same. The total completion time is reduced by at least s since j needs only one setup time now. If j is the last 2-job then we can make the same adjustment. □

Lemma 5. In the case of two machines there are no 1-jobs after a 2-job in an optimal schedule satisfying the properties of the preceding lemmas.

Proof. Suppose the lemma is not true. Then there must be a 2-job j that is directly followed by a 1-job. By Lemma 4 there must be at least one such 1-job on each machine, say jobs h and k. Assume without loss of generality that $p_h \le p_k$. Let $x_{1j}, x_{2j}$ be the processing times of j on machines 1 and 2, respectively. As argued before, without loss of generality we assume that $x_{1j} \ge x_{2j}$. Let us define the starting time of j on machine 1 as zero, and let $\Delta = x_{1j} - x_{2j}$, so that j starts at time Δ on machine 2. Note that $C_j = \frac{1}{2}(\Delta + p_j + 2s)$. Then, the sum of the three completion times is

$$
\begin{aligned}
C_j + C_h + C_k &= C_j + (C_j + p_h + s) + C_k \\
&= \Delta + p_j + 2s + p_h + s + C_k.
\end{aligned}
\tag{2}
$$

We reschedule the jobs j, h, k as follows; the remaining schedule stays the same. Place job j, the shortest among j, h, k, on machine 1 (unsplit), job h on machine 2 (unsplit), and behind these two, split job k over machines 1 and 2 in such a way that it completes on one machine at time $C_h$ and at time $C_k$ on the other. The sum of the completion times of the three jobs becomes

$$
(p_j + s) + (\Delta + p_h + s) + C_k,
$$

which is exactly s less than the sum of the three completion times before the switch in (2). □

Given the previous lemmas, we see that the 2-jobs are scheduled in SPT order at the end. By Lemma 2 the first 2-job, say job k, is not shorter than the preceding 1-jobs. But this implies that the 1-jobs can be scheduled in SPT order without increasing the completion time of job k and the following jobs. By considering each of the n jobs as the first 2-job, we immediately obtain an $O(n^2)$ time algorithm to solve the problem. Carefully updating consecutive solutions leads to a faster method.
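The $O(n^2)$ enumeration just described can be sketched in Python as follows. This is a sketch relying on the structure established by the lemmas above (1-jobs first in SPT order, then balanced 2-jobs in SPT order), not the authors' code; `two_machine_candidate` and `two_machine_opt` are hypothetical names. The fallback branch, which runs a would-be 2-job on a single machine when a balanced split would leave the later machine with no work, is our own addition so that every candidate describes a feasible schedule. Under the paper's structural results, the minimum over all candidates is then the optimal value.

```python
from fractions import Fraction


def two_machine_candidate(p, s, u):
    """Total completion time of the candidate schedule in which the u shortest
    jobs are 1-jobs (SPT list scheduling: each goes to the machine that becomes
    free first) and the remaining jobs are 2-jobs balanced over both machines.
    Assumes p is sorted in non-decreasing order."""
    free = [Fraction(0), Fraction(0)]
    total = Fraction(0)
    for j, pj in enumerate(p):
        lo, hi = sorted(free)
        if j < u:
            c = lo + s + pj  # 1-job on the earlier machine
            free = [hi, c]
        else:
            c = (lo + hi + 2 * s + pj) / 2  # balanced 2-job
            if c <= hi + s:
                # A balanced split would give the later machine no work, so
                # (our fallback) run the job alone on the earlier machine.
                c = lo + s + pj
                free = [hi, c]
            else:
                free = [c, c]
        total += c
    return total


def two_machine_opt(p, s):
    """O(n^2): try every number u of 1-jobs, 0 <= u <= n, and keep the best."""
    p = sorted(p)
    return min(two_machine_candidate(p, s, u) for u in range(len(p) + 1))
```

For example, `two_machine_opt([1, 2], 10)` returns 23: with such a large setup time both jobs stay unsplit. With s = 0, every job is split and the result is half the sum of the prefix sums of the sorted processing times.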

Theorem 1. There exists an $O(n \log n)$ algorithm for minimising the total completion time of jobs on two
identical parallel machines with job splitting and uniform setup times. 
Proof. Suppose we schedule the first k jobs (for any $1 \le k \le n$) in SPT order as 1-jobs and the other jobs in SPT order as 2-jobs. We would like to compute the change in objective value that results from changing job k from a 1-job to a 2-job. However, this happens to give a rather complicated formula. It is much easier to consider the change for jobs $k-1$ and $k$ simultaneously.

The schedule for the 1-jobs $j < k-1$ does not change. To facilitate the exposition, suppose that job $k-1$ starts at time zero and job k starts at time a. Then $C_{k-1} + C_k = p_{k-1} + s + a + p_k + s$. After changing the jobs to 2-jobs, the new completion times become $C'_{k-1} = (a + p_{k-1} + 2s)/2$ and $C'_k = (a + p_{k-1} + p_k + 4s)/2$. Hence,

$$
C'_{k-1} + C'_k - C_{k-1} - C_k = s - p_k/2.
$$

In addition, each job $j > k$ completes s time units later. Hence, the total increase in objective value due to changing both jobs $k-1$ and $k$ from 1-jobs to 2-jobs is

$$
f(k) := (n - k + 1)s - p_k/2.
$$
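The arithmetic behind $f(k)$ can be checked mechanically. The following Python sketch (ours; `pair_change` and `f` are hypothetical names) recomputes $C'_{k-1}$ and $C'_k$ from the balancing condition and confirms that the pair's total changes by $s - p_k/2$. It assumes $0 \le a \le p_{k-1}$, a condition the text does not state explicitly, so that the balanced split of job $k-1$ leaves a non-negative part on both machines.

```python
from fractions import Fraction


def pair_change(a, p_prev, p_k, s):
    """Change in C_{k-1} + C_k when the last two 1-jobs k-1 and k become 2-jobs.
    Job k-1 starts at time 0 on machine A and job k at time a on machine B;
    assumes 0 <= a <= p_prev so that the balanced split of job k-1 is feasible."""
    old = (s + p_prev) + (a + s + p_k)
    # job k-1 balanced over A (free at 0) and B (free at a)
    c_prev = Fraction(a + 2 * s + p_prev, 2)
    assert c_prev >= a + s, "the part of job k-1 on machine B would be negative"
    # both machines are now free at c_prev; job k is split evenly over them
    c_k = c_prev + s + Fraction(p_k, 2)
    return c_prev + c_k - old


def f(k, n, s, p_k):
    """Total increase f(k) = (n - k + 1)s - p_k/2 from the proof of Theorem 1."""
    return (n - k + 1) * s - Fraction(p_k, 2)


# Exhaustive check on small integer values with 0 <= a <= p_prev <= p_k.
for s in range(4):
    for p_prev in range(6):
        for a in range(p_prev + 1):
            for p_k in range(p_prev, p_prev + 4):
                assert pair_change(a, p_prev, p_k, s) == s - Fraction(p_k, 2)
```

Since $p_k$ is non-decreasing and $s > 0$, evaluating `f` for increasing k yields a decreasing sequence, which is what the case analysis below exploits.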

Notice that $f(k)$ is decreasing in k, since $s > 0$ and $p_k$ is nondecreasing in k. Hence, either there exists some $k \in \{2, \dots, n\}$ such that $f(k) < 0$ and $f(k-1) \ge 0$, or $f(n) \ge 0$, or $f(2) < 0$.

Suppose there exists some $k \in \{2, \dots, n\}$ such that $f(k) < 0$ and $f(k-1) \ge 0$. The optimal schedule is to have either $k-1$ or $k-2$ unsplit jobs, since the first inequality and monotonicity imply that a schedule with $k-2$ unsplit jobs has a better objective value than a schedule with k or more unsplit jobs, and the second inequality and monotonicity imply that a schedule with $k-1$ unsplit jobs has an objective value at least as good as a schedule with $k-3$ or fewer unsplit jobs.

If $f(n) \ge 0$ then the optimal solution is either to have only 1-jobs or have only job n as a 2-job. If $f(2) < 0$ then the optimal solution is either to have only 2-jobs or have only job 1 as a 1-job.

Straightforward implementation of the above gives the desired algorithm, the running time of which is 
dominated by sorting the jobs in SPT order. □ 



4 Troubles on more machines 

The properties exposed in Section 2 have been proved to hold for any number of machines. The properties presented in Section 3 are specific to two machines. In this section we investigate their analogues for three and more machines. We will present some examples of instances that show that the extension is far from trivial. The complexity of the problem on three or more machines thus remains an intriguing open problem.

Consider the instance on three machines having 10 jobs with their vector of processing times p = (3, 10, 10, 10, 10, 50, 50, 50, 50, 50) (1 small job, 4 middle size jobs and 5 large jobs) and s = 0.7. An optimal solution is depicted in Figure 2(a). As we see, job 2 is split over machines 2 and 3, but job 3, starting later than job 2, is not split. Jobs 4 and 5 are again what we call 2-jobs and are split over machines 2 and 3. The large jobs are all split over all three machines.

We will consider two solutions to be the same, if one solution can be obtained from the other by a 
relabelling of machines, and/or (repeatedly) swapping the schedule of two machines from some time t till 
the end of the schedule, if these two machines both complete processing of some job at time t. Also we will 
assume, without loss of generality, that all processing times are distinct, by slightly perturbing the processing
times if necessary, obtaining $p_j < p_{j+1}$ for all $j = 1, 2, \dots, n-1$.



The solution depicted in Figure 2(a) is not the unique optimal solution for this instance. The Gantt charts of the other three optimal solutions are given in the other three subfigures of Figure 2. However, all optimal solutions for this instance share the property that job 2 is a 2-job, and either job 3 or job 4 is a 1-job. From this example we see that an optimal schedule does not necessarily have the property that $|M_j|$ is monotone in j.

Figure 2: Gantt charts depicting the optimal solutions to the 3-machine instance with processing times p = (3, 10, 10, 10, 10, 50, 50, 50, 50, 50) (1 small job, 4 middle size jobs and 5 large jobs) and s = 0.7. The grey blocks indicate the setup times, the numbered blocks are scheduled job parts. Each row of blocks gives the schedule for a machine.

[Gantt chart of Figure 3: machine 1 processes jobs 1, 3, 4, 5, 6, 7, 8, 9; machine 2 processes jobs 2, 3, 5, 6, 7, 8, 9; machine 3 processes jobs 2, 4, 5, 6, 7, 8, 9.]

Figure 3: Gantt chart depicting the unique optimal solution to the 3-machine instance with processing times p = (3, 10, 10, 10, 10, 50, 50, 50, 50) (1 small job, 4 middle size jobs and 4 large jobs) and s = 0.7.

We now describe the other three optimal solutions for this instance. The second optimal schedule, in Figure 2(c), is obtained by scheduling job 1 on machine 1, job 2 split on machines 2 and 3, job 3 on machine 2 (or 3), and jobs 4 and 5 as split jobs on the machines not used by job 3. The remaining jobs are again all split on all three machines. It is easily verified that the objective of this schedule is the same as the objective of the schedule in Figure 2(a): the completion time of job 3 increases by 2, the completion times of jobs 4 and 5 each decrease by 1, and all other completion times remain the same. The remaining two optimal schedules, in Figures 2(b) and 2(d), are obtained by switching jobs 3 and 4. We note that this continues to hold if their processing times differ by some small value.

If we slightly change the instance by deleting one of the large jobs, then there is a unique optimal solution, which splits job 3 over machines 1 and 2 and continues with splitting job 4 over machines 1 and 3. Job 5 and the four large jobs are split over all three machines; see Figure 3.
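The completion-time bookkeeping in this section can be reproduced with exact arithmetic. In the following Python sketch (ours; all names are hypothetical), machines are numbered from 0 and each job is balanced over the machines listed for it. The machine sets for Figures 2(a) and 2(c) are our reading of the text (Figure 2(a): job 3 alone on machine 1 and jobs 2, 4, 5 on machines 2 and 3; Figure 2(c): job 3 alone on machine 2 and jobs 4 and 5 on machines 1 and 3), and those for Figure 3 follow the chart description. The sketch only evaluates these schedules; it does not verify their optimality.

```python
from fractions import Fraction


def balanced_finish(free_times, p, s):
    """Common finishing time of a job split over machines free at free_times,
    with a setup s before every part."""
    k = len(free_times)
    finish = (sum(free_times) + k * s + p) / k
    if any(finish < t + s for t in free_times):
        raise ValueError("balanced split is infeasible on these machines")
    return finish


def completion_times(assignment, p, s, m):
    """Completion time of every job; assignment[j] lists the machines of job j."""
    free = [Fraction(0)] * m
    times = []
    for j, machines in enumerate(assignment):
        c = balanced_finish([free[i] for i in machines], p[j], s)
        for i in machines:
            free[i] = c
        times.append(c)
    return times


S = Fraction(7, 10)
ALL = [0, 1, 2]
P10 = [3, 10, 10, 10, 10, 50, 50, 50, 50, 50]
FIG_2A = [[0], [1, 2], [0], [1, 2], [1, 2]] + [ALL] * 5
FIG_2C = [[0], [1, 2], [1], [0, 2], [0, 2]] + [ALL] * 5
P9 = P10[:-1]  # one large job removed
FIG_3 = [[0], [1, 2], [0, 1], [0, 2], ALL] + [ALL] * 4
```

Comparing `completion_times(FIG_2A, P10, S, 3)` with `completion_times(FIG_2C, P10, S, 3)` shows the +2, -1, -1 pattern for jobs 3, 4 and 5 and identical values for all other jobs, so both schedules have the same objective.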

That such a subtle change causes such a substantial change in the optimal schedule bodes ill for an
algorithmic approach like the one in Section 3. These examples do not rule out that there exists an optimal
schedule in which the jobs are started (or finished) in SPT order. A proof of either of these properties has 
proved elusive, however. 
5 Approximation algorithm



We will now show a constant-factor approximation algorithm for our problem, for an arbitrary number of
machines. We remark that we do not know whether this problem is NP-hard, but the examples in the
previous section do show that the way a job is scheduled in an optimal schedule may depend on jobs that 
occur later in the schedule. Our approximation algorithm, on the other hand, is remarkably simple, and 
only uses a job's processing time and the setup time to determine how to schedule the job. 

We schedule the jobs in order of non-decreasing processing time. Let $s > 0$ and let α be some constant that will be determined later. Job j will be scheduled such that it completes as early as possible under the restriction that it uses at most $\ell_j := \min\{\lceil \alpha p_j / s \rceil, m\}$ machines. Thus, the job will be scheduled on at most $\ell_j$ machines, chosen among those that have minimum load in the schedule so far. It is easy to see that a job is always balanced this way.
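A minimal Python sketch of this rule follows (ours, not the authors' code; `approx_schedule` is a hypothetical name). For each job it sorts the machines by current load and fills the least loaded ones in turn, adding another machine only while that machine can still start a non-empty part before the current finishing time; this water-filling step is our way of realising "completes as early as possible", and it yields a balanced job. The sketch assumes $s > 0$, uses exact fractions for the loads, and guards the degenerate case $p_j = 0$ (where the formula would give $\ell_j = 0$) by using one machine.

```python
import math
from fractions import Fraction


def approx_schedule(p, s, m, alpha):
    """Jobs in SPT order; job j uses at most l_j = min(ceil(alpha * p_j / s), m)
    of the least loaded machines and finishes as early as possible with
    balanced parts. Requires s > 0. Returns completion times in SPT order."""
    free = [Fraction(0)] * m
    times = []
    for pj in sorted(p):
        l_j = max(1, min(math.ceil(alpha * pj / s), m))  # guard for p_j = 0
        order = sorted(range(m), key=lambda i: free[i])  # least loaded first
        used = [order[0]]
        finish = free[order[0]] + s + pj
        for i in order[1:l_j]:
            if free[i] + s >= finish:
                break  # this machine (and every later one) would get no work
            used.append(i)
            finish = (sum(free[u] for u in used) + len(used) * s + pj) / len(used)
        for i in used:
            free[i] = finish
        times.append(finish)
    return times
```

On the introductory instance (processing times 1, 2, 3, 5, 11, 12, three machines, s = 1) with α = 0.79, the sketch returns a total completion time of 127/3, roughly 42.3, against the optimal value 40 reported in Section 1.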

Theorem 2. The algorithm described above is a $(2 + \alpha)$-approximation algorithm for minimising the total completion time with job splitting and uniform setup times, provided that $\alpha \ge \frac{1}{4}(\sqrt{17} - 1)$.

Proof. Let σ be the schedule produced by the described algorithm. Note that the total load (processing times plus setup times) of all jobs in σ up to, but not including, job j is upper bounded by $L_j = \sum_{k<j}(p_k + \ell_k s)$, since job k introduced at most $\ell_k$ setups. Therefore, the average load on the $\ell_j$ least loaded machines is upper bounded by $L_j/m$. Since job j is balanced, we can thus upper bound the completion time $C_j$ of job j in the schedule by $L_j/m + p_j/\ell_j + s$. Note that this is an upper bound on the completion time of job j when we try to schedule it on at most $\ell_j$ machines.
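Noting that

$$
p_j/\ell_j = p_j / \min\{\lceil \alpha p_j/s \rceil, m\} \le (1/\alpha)s + p_j/m
$$

and

$$
\ell_k s = \min\{\lceil \alpha p_k/s \rceil, m\}\, s \le \alpha p_k + s,
$$

we obtain

$$
\begin{aligned}
C_j &\le L_j/m + p_j/\ell_j + s \\
&\le \frac{1}{m}\sum_{k<j}(p_k + \ell_k s) + p_j/\ell_j + s \\
&\le \frac{1}{m}\sum_{k<j}\bigl((1+\alpha)p_k + s\bigr) + p_j/m + (1 + 1/\alpha)s \\
&\le \frac{1+\alpha}{m}\sum_{k\le j} p_k + \Bigl(\frac{j-1}{m} + 1 + \frac{1}{\alpha}\Bigr)s.
\end{aligned}
$$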

We can lower bound the sum of completion times in an optimal schedule by $\sum_j \bigl(s + \frac{1}{m}\sum_{k\le j} p_k\bigr)$: suppose we only needed a setup time for the first job to be processed on a machine, for any machine. Clearly, the optimal sum of completion times for this problem gives a lower bound on the optimum. Now, the optimal schedule when we only need a setup time for the first job on a machine processes the jobs in SPT order and splits each job over all machines, which gives a sum of completion times of $\sum_j \bigl(s + \frac{1}{m}\sum_{k\le j} p_k\bigr)$.

Also, in any schedule, at most m jobs are preceded by only one setup, at most another m by two setups, etc., giving a lower bound of $\sum_j \lceil j/m \rceil s$ on the sum of completion times: this is exactly the optimal value when all processing times are 0. We will show below that $\sum_j \lceil j/m \rceil s \ge \sum_j \frac{j-1}{m}s + \frac{1}{2}ns$.
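Hence, by using 1 + α times the first bound, and 1 time the second bound, we get, with $C^*_j$ the completion time of job j in an optimal schedule,

$$
(2+\alpha)\sum_j C^*_j \ge (1+\alpha)\sum_j \Bigl(s + \frac{1}{m}\sum_{k\le j}p_k\Bigr) + \sum_j \Bigl(\frac{j-1}{m} + \frac{1}{2}\Bigr)s = \sum_j \Bigl(\frac{1+\alpha}{m}\sum_{k\le j}p_k + \Bigl(\frac{j-1}{m} + \frac{3}{2} + \alpha\Bigr)s\Bigr),
$$

which is at least as large as $\sum_j C_j$ provided $\alpha > 0$ and $\frac{3}{2} + \alpha \ge 1 + \frac{1}{\alpha}$, which is equivalent to $\alpha \ge \frac{1}{4}(\sqrt{17} - 1)$.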
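Next we show that $\sum_j \lceil j/m\rceil s \ge \sum_j \frac{j-1}{m}s + \frac12 ns$. Consider a job $j$ and write $j = gm + a$ for some $g \ge 0$ and $a \in \{1,\dots,m\}$. Then

$$\Bigl\lceil \frac jm\Bigr\rceil - \frac{j-1}{m} = (g+1) - \frac{gm + a - 1}{m} = 1 - \frac{a-1}{m}.$$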



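Now assume that $n = rm + b$ for some integer $r \ge 0$ and $b \in \{1,\dots,m\}$. Then

$$\begin{aligned}
\sum_{j=1}^{n}\Bigl(\Bigl\lceil\frac jm\Bigr\rceil - \frac{j-1}{m}\Bigr) &= r\sum_{a=1}^{m}\Bigl(1 - \frac{a-1}{m}\Bigr) + \sum_{a=1}^{b}\Bigl(1 - \frac{a-1}{m}\Bigr)\\
&= rm - r\sum_{a=1}^{m}\frac{a-1}{m} + b - \sum_{a=1}^{b}\frac{a-1}{m}\\
&= n - \frac{r(m-1)}{2} - \frac{(b-1)b}{2m}\\
&\ge n - \frac{r(m-1)}{2} - \frac{b-1}{2}\\
&= n - \frac{rm+b}{2} + \frac r2 + \frac12\\
&= \frac n2 + \frac r2 + \frac12 > \frac n2.
\end{aligned}$$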
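The closed form and the final inequality can be checked mechanically. The Python sketch below (the helper names `ceiling_gap` and `closed_form` are ours) evaluates $\sum_{j=1}^{n}(\lceil j/m\rceil - (j-1)/m)$ exactly with rational arithmetic and compares it with $n - r(m-1)/2 - (b-1)b/(2m)$, where $n = rm + b$ and $b \in \{1,\dots,m\}$.

```python
from fractions import Fraction


def ceiling_gap(n: int, m: int) -> Fraction:
    """Exact value of sum_{j=1}^{n} (ceil(j/m) - (j-1)/m)."""
    return sum(
        (Fraction((j + m - 1) // m) - Fraction(j - 1, m) for j in range(1, n + 1)),
        Fraction(0),
    )


def closed_form(n: int, m: int) -> Fraction:
    """n - r(m-1)/2 - (b-1)b/(2m), where n = r*m + b and 1 <= b <= m."""
    r, b = divmod(n - 1, m)
    b += 1  # shift the remainder from {0, ..., m-1} to {1, ..., m}
    return n - Fraction(r * (m - 1), 2) - Fraction((b - 1) * b, 2 * m)


# Compare both expressions with each other and with the bound n/2.
for m in range(1, 9):
    for n in range(1, 60):
        gap = ceiling_gap(n, m)
        assert gap == closed_form(n, m)
        assert gap > Fraction(n, 2)
```

Using `Fraction` instead of floats keeps the equality test exact.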



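Hence multiplying both sides with $s$ yields

$$\sum_{j=1}^{n}\Bigl\lceil\frac jm\Bigr\rceil s \ge \sum_{j=1}^{n}\frac{j-1}{m}s + \frac12 ns. \qquad\square$$

Corollary 1. There exists a $\bigl(2+\frac14(\sqrt{17}-1)\bigr)$-approximation algorithm for minimising total completion time with job splitting and uniform setup times, where $2+\frac14(\sqrt{17}-1) < 2.781$.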
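To make the analysed algorithm concrete, here is a small Python simulation. It is a sketch under our own assumptions: all $p_j > 0$ and $s > 0$; each job $j$, in SPT order, is offered the $\ell_j = \min\{\lceil \alpha p_j/s\rceil, m\}$ least loaded machines and is split by water-filling, so that the parts actually used all finish at the same time (a machine whose load is too high to help is left out, in line with the "at most $\ell_j$ machines" remark in the proof). The function names are hypothetical. The script also evaluates the two lower bounds from the proof and checks on random instances that the total completion time is at most $1+\alpha$ times the first bound plus the second bound, hence at most $(2+\alpha)$ times the optimum.

```python
import math
import random

ALPHA = (math.sqrt(17) - 1) / 4  # alpha = (sqrt(17) - 1) / 4


def split_schedule(p, m, s, alpha=ALPHA):
    """Greedy balanced split schedule (hypothetical helper, assumes p_j > 0, s > 0).

    Returns the completion times of the jobs in SPT order.
    """
    loads = [0.0] * m  # current finishing time of each machine (no idle time)
    completions = []
    for pj in sorted(p):  # SPT order
        ell = min(max(1, math.ceil(alpha * pj / s)), m)
        chosen = sorted(range(m), key=lambda i: loads[i])[:ell]  # least loaded
        ready = [loads[i] + s for i in chosen]  # earliest start after a setup
        # Water-filling: use the k least loaded machines, all finishing at c.
        total, k = 0.0, 0
        while True:
            total += ready[k]
            k += 1
            c = (total + pj) / k
            if k == ell or c <= ready[k]:
                break
        for i in chosen[:k]:
            loads[i] = c  # setup s plus a part of length c - ready time
        completions.append(c)
    return completions


def lower_bounds(p, m, s):
    """The two lower bounds on the optimum used in the proof."""
    lb1, prefix = 0.0, 0.0
    for pj in sorted(p):
        prefix += pj
        lb1 += s + prefix / m  # one setup per machine, full splitting
    lb2 = sum((j + m - 1) // m * s for j in range(1, len(p) + 1))  # ceil(j/m) s
    return lb1, lb2


random.seed(1)
for _ in range(300):
    m = random.randint(1, 6)
    s = random.uniform(0.1, 5.0)
    p = [random.uniform(0.01, 20.0) for _ in range(random.randint(1, 20))]
    alg = sum(split_schedule(p, m, s))
    lb1, lb2 = lower_bounds(p, m, s)
    assert max(lb1, lb2) <= alg * (1 + 1e-9)
    assert alg <= ((1 + ALPHA) * lb1 + lb2) * (1 + 1e-9)
```

Because only the least loaded machines are offered to a job, their average load is at most the overall average, which is what the bound $C_j \le L_j/m + p_j/\ell_j + s$ relies on.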



6 A polynomial-time approximation scheme 

We give an approximation scheme which runs in polynomial time if the number of machines is assumed constant. The idea is simple: by splitting a job $j$, at most $p_j$ can be saved on its completion time. It is easy to show that the value of a non-preemptive SPT schedule is no more than $\sum_j p_j$ larger than Opt. In particular, if we schedule the first $K = n - m/\varepsilon$ jobs by non-preemptive SPT, then the extra cost is at most $\sum_{j=1}^{K}p_j$. But, as we will see, this is only an $\varepsilon$-fraction of the total completion time of the last $m/\varepsilon$ jobs. We schedule these last jobs optimally, given the schedule of the first $K$ jobs.

Now, we define the algorithm and its running time in more detail. Let, as before, $p_1 \le \dots \le p_n$. Let $K = n - m/\varepsilon$. (If $K \le 0$ then $n \le m/\varepsilon$ and the optimal solution can be found in constant time.) Assume that $K$ is an integer. Let $\rho$ be an optimal schedule and let $\rho(K)$ be the schedule $\rho$ restricted to the jobs $1, 2, \dots, K$. By Lemma 2 we may assume that $\rho(K)$ has no idle time. Let $t_i(\rho)$ be the completion time of machine $i$ in $\rho(K)$. The algorithm makes an approximate guess about the values $t_i(\rho)$. That means, it finds values $t_i$ such that

$$t_i(\rho) \le t_i \le t_i(\rho) + s + p_K. \qquad (3)$$

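Note that for any $i$, we have $t_i(\rho) \le K(s + p_K)$. Hence, since each $t_i$ can be taken to be a multiple of $s + p_K$, we need to try only $(K+1)^m$ guesses for $(t_1,\dots,t_m)$. Assume from now on that we guessed $(t_1,\dots,t_m)$ correctly, i.e., (3) is satisfied.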
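The guessing step can be illustrated with a short Python sketch. It assumes (our reading) that each guess $t_i$ is restricted to the grid $\{0, \delta, 2\delta, \dots, K\delta\}$ with $\delta = s + p_K$; rounding the unknown $t_i(\rho) \le K\delta$ up to this grid then gives a guess that satisfies (3). The helper names `guess_grid` and `round_up_guess` are hypothetical.

```python
import itertools
import math


def guess_grid(K, m, s, pK):
    """All candidate vectors (t_1, ..., t_m) with entries in {0, d, ..., K*d}, d = s + pK."""
    d = s + pK
    values = [q * d for q in range(K + 1)]
    return itertools.product(values, repeat=m)  # (K + 1) ** m vectors


def round_up_guess(t_rho, s, pK):
    """Grid point obtained by rounding each t_i(rho) up to a multiple of s + pK."""
    d = s + pK
    return tuple(math.ceil(t / d) * d for t in t_rho)


# Example: K = 4 jobs already fixed, m = 3 machines, s = 1, p_K = 2 (so d = 3).
t_rho = (0, 5, 12)
guess = round_up_guess(t_rho, 1, 2)
assert guess in set(guess_grid(4, 3, 1, 2))
assert all(tr <= g <= tr + 3 for tr, g in zip(t_rho, guess))  # condition (3)
```

Enumerating the grid is polynomial in $K$ for a fixed number of machines $m$, which is what keeps the scheme polynomial.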

We apply SPT to the jobs $1, 2, \dots, K$ such that no machine $i$ is loaded more than $t_i + s + p_K$. This can easily be done as follows: apply list scheduling in SPT order and close a machine once its load becomes $t_i$ or more. Let $T_i$ be the completion time of machine $i$ in the resulting schedule. Then $T_i \le t_i + s + p_K \le t_i(\rho) + 2(s + p_K)$.
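A minimal Python sketch of this capped list-scheduling step, assuming (as we read it) that each job is placed unsplit, with one setup, on the currently least loaded machine that is still open. The function name and the error raised for a too-small guess are our own additions.

```python
def capped_spt_list_schedule(p_first, t, s):
    """Schedule jobs 1..K unsplit in SPT order; machine i closes once its load >= t[i].

    Returns the final machine loads T_i and, per job (in SPT order), its machine.
    """
    loads = [0.0] * len(t)
    assignment = []
    for pj in sorted(p_first):  # SPT order
        open_machines = [i for i in range(len(t)) if loads[i] < t[i]]
        if not open_machines:
            raise ValueError("guess (t_1, ..., t_m) too small for jobs 1..K")
        i = min(open_machines, key=lambda q: loads[q])  # least loaded open machine
        loads[i] += s + pj  # one setup plus the whole job
        assignment.append(i)
    return loads, assignment


# Example: jobs with p = 4, 1, 2, 3, setup s = 1 and guesses t = (5, 5).
loads, assignment = capped_spt_list_schedule([4, 1, 2, 3], [5, 5], 1)
```

A machine receives a job only while its load is below $t_i$, and each job adds at most $s + p_K$, so every final load stays below $t_i + s + p_K$. If $\sum_i t_i \ge \sum_{k\le K}(s + p_k)$, which (3) guarantees because $\rho(K)$ has no idle time and uses at least one setup per job, some machine is always open and all $K$ jobs are placed.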
Next, we find a near-optimal completion of the schedule by guessing for each job $j > K$ a set $M_j$ and applying linear programming. There are $2^{m(n-K)}$ possibilities for choosing such sets, which is a constant. The linear program works as follows. Note that the LP of Section 2 can be extended to do the following. Given a set $M_j$ for each job $j$ and a time $T_i$ for each machine $i$, we can find the optimal schedule among all schedules for which: (i) job parts are in SPT order on each machine, (ii) machine $i$ does not start before $T_i$, (iii) job $j$ can only be scheduled on machines in $M_j$, and (iv) job $j$ has a setup time $s$ for each machine in $M_j$ even when its processing time $x_{ij}$ is zero. Note that it is not clear whether the LP gives us the real optimal completion, since we have not proved that the SPT properties also hold for optimal schedules if an initial part is fixed, as we do here. However, we can show that the solution given by the LP is close to optimal.

Approximation ratio. Let $\sigma$ be the final schedule and let $C_j^\sigma$ be the completion time of job $j$ in $\sigma$. Here we use Opt to denote the objective value of the optimal schedule $\rho$. For any $h \in \{1,\dots,n\}$ define $\mu_h = \sum_{k=1}^{h}(s + p_k)/m$. Then for any schedule, the $h$-th completion time is at least $\mu_h$. Hence,

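$$\mathrm{Opt} \ge \sum_{h=1}^{n}\mu_h \ge \sum_{h=K+1}^{n}\mu_h \ge \sum_{h=K+1}^{n}\mu_K = \frac m\varepsilon\,\mu_K = \frac1\varepsilon\sum_{k=1}^{K}(s + p_k).$$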



Further on in the proof we will use $C_h$ as the completion time of job $h$ in the optimal schedule. Here we use the notation $C^\rho_{(h)}$ for the $h$-th completion time of $\rho$ ($h = 1,\dots,n$). Notice that $C^\rho_{(h)}$ is not necessarily equal to $C_h$, the optimal completion time of job $h$ in $\rho$. For $h \le K$ it is easy to see that $C_h^\sigma \le C^\rho_{(h)} + s + p_h$.
This implies



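$$\sum_{h=1}^{K}C_h^\sigma \le \sum_{h=1}^{K}C^\rho_{(h)} + \sum_{h=1}^{K}(s + p_h) \le \sum_{h=1}^{K}C^\rho_{(h)} + \varepsilon\,\mathrm{Opt}.$$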

So for the first K jobs we are doing fine. Next we give a bound on the total completion time of the other 
jobs. 

Let $M_j^*$ be the set of machines used by job $j$ in the optimal schedule $\rho$. One of the guesses of the algorithm will be $M_j = M_j^*$ for $j > K$. We show that the corresponding LP solution gives a near-optimal completion of the schedule.

A feasible LP solution is to take for $x_{ij}$, $j > K$, the values that correspond to $\rho$ and to choose values $C_j' = C_j + 2(s + p_K)$, where we recall that $C_j$ is the completion time of job $j$ in the optimal schedule $\rho$. The latter is feasible since $T_i \le t_i(\rho) + 2(s + p_K)$. Hence, we can bound the total completion time of jobs $K+1,\dots,n$ by



$$\sum_{h=K+1}^{n}C_h^\sigma \le \sum_{h=K+1}^{n}\bigl(C_h + 2(s + p_K)\bigr) = \sum_{h=K+1}^{n}C_h + 2(n-K)(s + p_K). \qquad (5)$$

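To bound the second term in the right-hand side of (5) we derive another bound on Opt:

$$\begin{aligned}
\mathrm{Opt} &\ge \sum_{h=1}^{n}\mu_h \ge \sum_{h=K+1}^{n}\mu_h = \sum_{h=K+1}^{n}\sum_{k=1}^{h}\frac{s + p_k}{m}\\
&\ge \sum_{h=K+1}^{n}\sum_{k=K+1}^{h}\frac{s + p_k}{m} \ge \sum_{h=K+1}^{n}\frac{(h-K)(s + p_K)}{m}\\
&\ge \frac{(n-K)^2(s + p_K)}{2m} = \frac{(n-K)(s + p_K)}{2\varepsilon}.
\end{aligned}$$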

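Combining this with (5) we get

$$\sum_{h=K+1}^{n}C_h^\sigma \le \sum_{h=K+1}^{n}C_h + 2(n-K)(s + p_K) \le \sum_{h=K+1}^{n}C_h + 4\varepsilon\,\mathrm{Opt}.$$

Adding the bound for the first $K$ jobs, and using that $\sum_{h=1}^{K}C^\rho_{(h)} \le \sum_{h=1}^{K}C_h$ (the $K$ smallest completion times of $\rho$ sum to at most the completion times of any $K$ jobs), we can bound the total completion time by $(1 + 5\varepsilon)\mathrm{Opt}$.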

7 Hardness for weighted completion times 

We prove that introducing weights for the jobs in our problem makes it strongly NP-hard for any number 
of machines and weakly NP-hard for 2 machines. 

Theorem 3. The problem of minimising total weighted completion time with job splitting and uniform setup times on parallel identical machines ($P \mid s, \mathit{split} \mid \sum w_jC_j$) is strongly NP-hard.

Proof. We reduce from 3-Partition: given $3n$ positive numbers $a_1,\dots,a_{3n}$ and a number $A$ such that $a_1 + \dots + a_{3n} = nA$, does there exist a partition $A_1,\dots,A_n$ of $\{1,\dots,3n\}$ such that $|A_i| = 3$ and $\sum_{j\in A_i}a_j = A$ for all $i$? Given an instance of 3-Partition, we construct the following instance of our scheduling problem: we have $n$ machines and $3n$ jobs. We set $p_j = a_j$ and $w_j = a_j + s$ for all $j = 1,\dots,3n$, where the setup time $s$ is some large enough number, to be defined later.

The idea behind the reduction is the following: the large setup time will make sure that exactly three 
jobs are scheduled (unsplit) per machine. The weights are chosen so that a schedule where all machines 
complete at exactly the same time is optimal, if such a schedule is feasible. 
Suppose we schedule the jobs unsplit, where $A_i$ is the set of jobs processed on machine $i$. Then the cost of the schedule is

$$\sum_{j=1}^{3n}w_jC_j = \sum_{i=1}^{n}\sum_{j\in A_i}(s + a_j)\sum_{k\in A_i,\,k\le j}(s + a_k) = \frac12\sum_{i=1}^{n}\ell_i^2 + \frac12\sum_{j=1}^{3n}(s + a_j)^2,$$

where $k \le j$ ranges over the jobs processed on machine $i$ up to and including job $j$, and $\ell_i$ is the total load on machine $i$. Note that the second term is independent of the schedule. This cost is minimised when $\ell_i = \ell_h$ for all $i, h$, and this can be realised if a perfect 3-partition exists. Let us denote this minimum by $\mathrm{OPT}_{3P}$.
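The cost identity for unsplit schedules can be checked numerically. The Python sketch below (function names hypothetical) builds the jobs $p_j = a_j$, $w_j = a_j + s$ of the reduction, evaluates $\sum_j w_jC_j$ for a given assignment of jobs to machines by direct simulation, and compares it with $\frac12\sum_i \ell_i^2 + \frac12\sum_j (s + a_j)^2$. The small 3-Partition instance is made up for illustration.

```python
from fractions import Fraction


def unsplit_cost(a, s, machines):
    """Sum of w_j C_j when machine i processes the jobs in machines[i], in that order.

    Job j has p_j = a[j] and w_j = a[j] + s, and each job needs one setup s.
    """
    cost = 0
    for jobs in machines:
        t = 0
        for j in jobs:
            t += s + a[j]  # setup followed by the whole job
            cost += (a[j] + s) * t  # w_j * C_j
    return cost


def cost_via_loads(a, s, machines):
    """(1/2) sum_i l_i^2 + (1/2) sum_j (s + a_j)^2, with l_i the load of machine i."""
    loads = [sum(s + a[j] for j in jobs) for jobs in machines]
    return Fraction(sum(l * l for l in loads) + sum((s + x) ** 2 for x in a), 2)


# Hypothetical 3-Partition instance with n = 2 and A = 6.
a = [1, 2, 3, 2, 2, 2]
s = 100
perfect = [[0, 1, 2], [3, 4, 5]]  # both triples sum to A = 6
unbalanced = [[0, 1, 3], [2, 4, 5]]  # sums 5 and 7
assert unsplit_cost(a, s, perfect) == cost_via_loads(a, s, perfect)
assert unsplit_cost(a, s, perfect) < unsplit_cost(a, s, unbalanced)
```

Because $w_j$ equals the load $s + a_j$ of job $j$, the order of the jobs on a machine does not change the cost; only the machine loads matter.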

If no perfect 3-partition exists, then any schedule in which no job is split has strictly higher cost than $\mathrm{OPT}_{3P}$. It remains to prove that any schedule with at least one split job also has a strictly higher cost than $\mathrm{OPT}_{3P}$.

First observe that

$$\mathrm{OPT}_{3P} = \frac n2(3s + A)^2 + \frac12\sum_{j=1}^{3n}(s + a_j)^2 = 6ns^2 + O(ns).$$



Now assume that at least one job is split; then there are at least $3n + 1$ setup times of $s$ each. Consider the extreme case where all $3n$ values $a_j$ are zero. In this case it is easy to see that the weighted sum of the $3n$ completion times is at least $(6n + 1)s^2$. Clearly, this bound holds as well for arbitrary values $a_j$. For large enough $s$ we have $(6n + 1)s^2 > \mathrm{OPT}_{3P}$. $\square$



Theorem 4. The problem $P2 \mid s, \mathit{split} \mid \sum w_jC_j$ is weakly NP-hard.



Proof. We now reduce from a restricted form of the Subset Sum problem: given $2n$ positive integers $a_1,\dots,a_{2n}$ such that $a_1 + \dots + a_{2n} = 2A$, is there a set $I \subset \{1,\dots,2n\}$ such that $|I| = n$ and $\sum_{i\in I}a_i = A$?



Given an instance of Subset Sum, we construct the following instance of our scheduling problem. We have 
2 machines and 2n jobs. We set $p_j = a_j$ and $w_j = a_j + s$ for $j = 1,\dots,2n$, where the setup time $s$ is some
large enough number, to be defined later. The proof follows the same reasoning as the previous proof: the 
large setup time will now make sure that exactly n jobs are scheduled (unsplit) per machine, and the weights 
will make sure that a schedule where the two machines complete at exactly the same time is optimal, if such 
a schedule is feasible. 

Suppose we schedule the jobs unsplit. Then, just as in the proof above for an arbitrary number of machines, the cost of the schedule is

$$\sum_{j=1}^{2n}w_jC_j = \frac12\sum_{i=1}^{2}\ell_i^2 + \frac12\sum_{j=1}^{2n}(s + a_j)^2,$$

where $\ell_i$ is the total load on machine $i$. Note that the second term is independent of the schedule. This cost is minimised when $\ell_1 = \ell_2$, and this can be realised if a perfect subset $I$ exists. Let us denote this minimum by $\mathrm{OPT}_S$.

If no perfect subset exists, any unsplit schedule has strictly higher cost. It remains to prove that also any schedule with at least one split has a strictly higher cost than $\mathrm{OPT}_S$.
First observe that

$$\mathrm{OPT}_S = (ns + A)^2 + \frac12\sum_{j=1}^{2n}(s + a_j)^2 = (n^2 + n)s^2 + O(ns).$$

Now assume that at least one job is split; then there are at least $2n + 1$ setup times of $s$ each. Consider the extreme case where all $2n$ values $a_j$ are zero. In this case it is easy to see that the weighted sum of the $2n$ completion times is at least $(n^2 + n + 1)s^2$. Clearly, this bound holds as well for arbitrary values $a_j$. For large enough $s$ we have $(n^2 + n + 1)s^2 > \mathrm{OPT}_S$. $\square$

8 Epilogue 

In the following table we gather the state of the art on scheduling problems with job splitting and uniform 
setup times. For describing the problems in the first column of the table we use the standard three-field 
scheduling notation [^. In the first field, expressing the processor environment, we only consider parallel 
identical machines, denoted by P, possibly with the number of parallel machines mentioned additionally. In 
the second field, expressing job characteristics, the term pmtn denotes ordinary preemption, split denotes 
job splitting as we consider in this paper and s denotes the presence of uniform setup times. Though this 
paper is mainly concerned with problems with a total completion time objective, indicated by $\sum C_j$ in the
third field, expressing the objective, we will also show the state of the art on the total weighted completion
time (indicated by $\sum w_jC_j$) and on the makespan (indicated by $C_{\max}$).

In the second column, we summarize the complexity status of these problems. A question mark indicates 
that the complexity of the problem is unknown. In the third column we give the best approximation guarantee 
known, where a '-' indicates that no algorithm with a performance guarantee is known. If we consider it 
relevant, we also present, as a footnote, the knowledge on the comparable version with preemption instead 
of splitting. 
Problem                                          Complexity     Best known guarantee
$P \mid \mathit{split} \mid \sum C_j$            in P           divide jobs equally over the machines in SPT order
$P2 \mid s, \mathit{split} \mid \sum C_j$        in P           algorithm of Section 3
$P \mid s, \mathit{split} \mid \sum C_j$         ?              2.781-approx. in Section 5
  cf. $P \mid s, \mathit{pmtn} \mid \sum C_j$    in P           SPT
$P \mid \mathit{split} \mid \sum w_jC_j$         in P           divide jobs equally over the …
  cf. $P \mid \mathit{pmtn} \mid \sum w_jC_j$    NP-hard [2]
$P \mid s, \mathit{split} \mid \sum w_jC_j$      NP-hard
$P \mid s, \mathit{split} \mid C_{\max}$                        …-approx.; split/assignment [3]